Introduction


The genetic changes that cause breast cancer remain incompletely understood.
HERV-K is an endogenous retrovirus of humans that is related to MMTV, the mammary
carcinogenic virus of mice, and to type-D primate retroviruses. All humans inherit about
30 full-length HERV-K proviruses (retrovirus DNA genomes integrated into the DNA of
the host) in germline DNA from their parents. Many of these formed in recent human
evolution, after the human and chimpanzee lineages diverged (1). This observation led
us to hypothesize that HERV-K may still be capable of reinfecting humans today. Any
retrovirus that can replicate is inherently mutagenic and is therefore worth considering
as an agent of genetic changes that might lead to diseases such as cancer. Given the
relationship to MMTV, a virus that causes breast cancer in mice, we proposed to
examine human breast cancer.

Body 


Most  oncogenic  retroviruses,  including  MMTV,  cause  tumors  by  insertion  of  the 
viral  genome  adjacent  to  cellular  oncogenes  (2,  3).  The  viral  DNA  genome,  an  essential 
intermediate  in  viral  replication,  inserts  at  random  positions  within  the  host  cell  genome 
to  form  proviruses.  If  the  virus  genome  inserts  itself  adjacent  to  an  appropriate  cellular 
oncogene  and  is  capable  of  affecting  the  expression  of  that  gene  in  a  manner  that 
contributes  to  oncogenic  transformation,  that  specific  infected  cell  can  proliferate 
clonally  to  form  a  tumor.  The  key  signature  of  such  an  event  is  the  presence  of  the 
provirus  in  all  cells  of  the  tumor. 

Therefore  we  developed  a  robust,  inverse  PCR-based  assay  (4)  to  search  the 
genomes  of  primary  human  breast  carcinomas  for  the  presence  of  somatically  acquired 
HERV-K  proviruses.  The  PCR  assay  uses  primers  in  the  viral  genome  to  amplify  all  the 
HERV-K  proviruses  in  the  human  genome,  with  each  provirus  yielding  a  different  length 
fragment.  The  products  are  separated  by  gel  electrophoresis  in  a  single  lane  of  a  gel. 

By  comparing  tumor  and  normal  (germline)  DNA  from  the  same  individual,  we  can 
search  for  the  presence  of  somatically  acquired  proviruses  in  tumor  samples.  Analysis 
of  17  such  samples  failed  to  detect  any  such  proviruses. 

The  comparison  of  HERV-K  proviruses  in  different  individual  humans  led  us  to 
the  realization  that  some  HERV-K  proviruses  were  in  some  individuals  but  not  in  others 
(5).  We  confirmed  these  findings  with  additional  assays  and  performed  an  initial  screen 
of  the  allele  frequencies  and  distribution  among  genetically  diverse  humans.  In 
addition,  we  found  that  a  few  of  the  HERV-K  proviruses  in  humans  have  full-length  open 
reading  frames  for  all  viral  proteins.  A  publication  describing  this  work  is  included  in  the 
appendix. 

The biggest difficulty that we encountered was acquiring matched tumor and
normal DNA samples, specifically the latter, from single individuals. As a result, we
have not yet examined nearly as many tumors as we had initially planned.
However, our examination of these tumors is continuing.
Key Research Accomplishments

No evidence for HERV-K infection of human breast carcinomas has been obtained so far.

The  first  detection  of  insertional  polymorphisms  of  full-length  endogenous  retroviruses  in 
humans  was  made. 

The  first  detection  of  human  endogenous  retroviruses  with  apparently  intact  open 
reading  frames  for  all  viral  proteins  was  made. 

Reportable  Outcomes 

Turner, G., Barbulescu, M., Su, M., Jensen-Seaman, M.I., Kidd, K.K., and Lenz, J.
(2001) Insertional polymorphisms of full-length endogenous retroviruses in humans.
Current Biology 11:1531-1535.

Conclusions 

No evidence for HERV-K infection of human breast carcinomas has been obtained so
far, although the studies are continuing. Insertional polymorphisms of full-length
HERV-K proviruses, and HERV-K proviruses with full-length open reading frames for all
viral proteins, exist in humans today. These findings are important because they show
that HERV-K replicated so recently in human evolution that the proviral alleles are not
yet genetically fixed in the human genome, and that some of the genomes do not appear
to have acquired debilitating mutations over evolutionary time. They strongly support
the hypothesis that HERV-K is capable of reinfecting humans today.
Insertional polymorphisms of full-length endogenous retroviruses in humans

Geoffrey  Turner*,  Madalina  Barbulescu*,  Mei  Su*, 

Michael  I.  Jensen-Seaman^,  Kenneth  K.  Kidd+  and  Jack  Lenz* 


Human endogenous retrovirus K (HERV-K) is distinctive among the retroviruses in
the human genome in that many HERV-K proviruses were inserted into the human
germline after the human and chimpanzee lineages evolutionarily diverged [1, 2].
However, all full-length endogenous retroviruses described to date in humans are
sufficiently old that all humans examined were homozygous for their presence [1].
Moreover, none are intact; all have lethal mutations [1, 3, 4]. Here, we describe the
first endogenous retroviruses in humans for which both the full-length provirus and
the preintegration site alleles are shown to be present in the human population
today. One provirus, called HERV-K113, was present in about 30% of tested
individuals, while a second, called HERV-K115, was found in about 15%. HERV-K113
has full-length open reading frames (ORFs) for all viral proteins and lacks any
nonsynonymous substitutions in amino acid motifs that are well conserved among
retroviruses. This is the first such endogenous retrovirus identified in humans. These
findings indicate that HERV-K remained capable of reinfecting humans through very
recent evolutionary times and that HERV-K113 is an excellent candidate for an
endogenous retrovirus that is capable of reinfecting humans today.
Results and discussion

BACs containing full-length HERV-K provirus clones were isolated as previously
described [1] by screening filters from the human RP11 BAC library (BACPAC
Resources) with a hybridization probe from the HERV-K pol gene. Since each provirus
is inserted at a different position in the human genome, the DNA sequences flanking
each provirus are unique to that particular provirus. The sequences immediately
flanking both sides of each cloned provirus were determined. Two proviruses
identified in this manner were called HERV-K113 and HERV-K115. A BLAST search of
the nr and htgs databases in GenBank with the sequences flanking these proviruses
identified only entries corresponding precisely to the empty or preintegration site
alleles that lack HERV-K sequences (Figure 1a). The sequences were localized on
the sequence assembly of the Human Genome Project Working Draft at UCSC for
HERV-K113 at chromosome 19p13.11 and for HERV-K115 at chromosome 8p23.1.
Proviral insertions were associated with duplication of a target sequence of 6 bp
(CTCTAT) for HERV-K113 and 5 bp (CCTTT) for HERV-K115. Both 5 and 6 bp
duplications have been observed previously for HERV-K [1]. In addition, 2 bp were
deleted from the ends of the viral LTRs, unambiguously indicating that the proviruses
were generated by a standard retroviral-integration process [5]. The existence of the
proviruses and sequences corresponding to the preintegration sites suggested that
both alleles exist at both loci in humans today.
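The target-site duplication described above can be illustrated with a short sketch. Only the duplicated motifs (CTCTAT and CCTTT) come from the text; the helper name `find_tsd` and all other flanking bases are hypothetical placeholders, not the real HERV-K113 or HERV-K115 flanks, which are listed in the supplementary material.

```python
def find_tsd(upstream_flank: str, downstream_flank: str, max_len: int = 10) -> str:
    """Return the longest sequence (up to max_len) that ends the upstream flank
    and also begins the downstream flank: a candidate target-site duplication."""
    longest = min(max_len, len(upstream_flank), len(downstream_flank))
    for size in range(longest, 0, -1):
        if upstream_flank[-size:] == downstream_flank[:size]:
            return upstream_flank[-size:]
    return ""


# Hypothetical host sequences immediately 5' and 3' of each provirus.
# Only the duplicated CTCTAT / CCTTT motifs are taken from the text.
k113_up, k113_down = "GGATTACACTCTAT", "CTCTATGGCAAGT"
k115_up, k115_down = "TTGACAGCCTTT", "CCTTTAGGTCAA"

print(find_tsd(k113_up, k113_down))
print(find_tsd(k115_up, k115_down))
```

In the preintegration site allele the same motif would be expected only once, which is what distinguishes an authentic empty site from a locus that merely resembles the flanks.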

To test this, a PCR strategy (Figure 1b) was used to search for both alleles in
samples of human genomic DNA. Reactions using primers in the sequences flanking
each provirus can distinguish the ancestral preintegration site, the full-length
provirus, and the solo long terminal repeat (LTR) that can form by homologous
recombination between the two LTRs of a provirus [1]. Genotyping of a small number
of genetically diverse humans showed that HERV-K113 and HERV-K115 are each
present in some humans but not others (Figure 1c). The human genomic DNAs tested
included individuals of sub-Saharan African (Biaka and Mbuti), Middle Eastern
(Druze), European, Chinese, Melanesian (Nasioi), and Native American (Mayan)
origin, plus several placental DNAs of unknown ethnic origin. HERV-K113 was present
in 9 of 31 individual samples (29%), including two Biaka individuals who were
homozygous for the provirus allele. HERV-K115 was detected in 5 of 31 samples
(16%). No solo LTRs were detected at either of these loci among the samples
tested. Excluding the placental samples that may have contained three alleles, the
HERV-K113 provirus allele frequency was 0.19 (9/48) in 24 diploid samples tested.
There may be considerable variation among different human populations. The
HERV-K115 provirus allele frequency was 0.04 (2/46) in 23 diploid samples tested. It
is curious that the single anonymous donor for the RP11 library was heterozygous for
both proviruses. Since most human allelic variation probably arose within the last
1,000,000 years [6-9], both HERV-K113 and HERV-K115 likely formed within this time
period.

Figure 1

Detection of HERV-K113 and HERV-K115 in human genomic DNA. (a) For each
provirus, the top line shows the sequence of the preintegration site allele while the
lower line shows the sequence of the proviral allele. The open box contains the target
site that was duplicated upon proviral integration. The black box indicates the viral
long-terminal repeat (LTR), with five nucleotides at the ends of the LTRs shown.
Positions of the 2 bp deletions that occurred at the ends of the LTRs upon integration
are indicated. (b) The PCR strategy used to detect the preintegration site, the
provirus, and the solo LTR is shown. Black boxes represent the viral LTRs, and open
boxes represent the target site duplications. Positions and orientations of the PCR
primers (A, B, B', C', C, and D) are shown. The product generated with the AD primer
pair is larger from the solo LTR than from the preintegration site by the size of a viral
LTR (~970 bp). (c) PCR products obtained with the indicated primer pairs are shown.
Only the proviral 3' junction reactions are shown. The same results were obtained
with the proviral 5' junctions. The human population of origin for each sample is
indicated: Euro, European; Chin, Chinese; Nas, Nasioi; M, size marker. The smaller
band at the bottom of the HERV-K115 A + D reactions consists of unincorporated
primers.
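The allele frequencies reported for the diploid samples follow from simple allele counting, as the sketch below shows. The function name `allele_frequency` is an illustrative assumption; the counts (9 provirus alleles in 24 individuals, 2 in 23) are the ones given in the text.

```python
from fractions import Fraction


def allele_frequency(provirus_alleles: int, diploid_individuals: int) -> Fraction:
    """Provirus allele count divided by total alleles (two per diploid individual)."""
    return Fraction(provirus_alleles, 2 * diploid_individuals)


k113 = allele_frequency(9, 24)  # 9/48, as reported for HERV-K113
k115 = allele_frequency(2, 23)  # 2/46, as reported for HERV-K115

print(f"HERV-K113 provirus allele frequency: {float(k113):.2f}")
print(f"HERV-K115 provirus allele frequency: {float(k115):.2f}")
```

Counting alleles rather than carriers is why the allele frequency (0.19) is lower than the fraction of individuals carrying HERV-K113 (29%): most carriers are heterozygous.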

PCR  products  were  amplified  only  from  orthologous  loci 

We considered the possibility that the PCR primers used to detect the
preintegration sites might amplify sequences from other loci in the human genome
and thus yield erroneous data about which alleles were present. RepeatMasker and
BLAST analyses showed that HERV-K113 was inserted in a 785 bp stretch of human
DNA sequence between a SINE/AluY element and a LINE1/MA2 element that was part
of a large, low-copy repeat sequence. Most of the low-copy repeats, like the one
containing HERV-K113, were located on chromosome 19. Comparison of 500 bp
immediately flanking the provirus insertion site to the corresponding stretches in the
other repeats showed that all were less than 90% identical to the sequences flanking
the provirus. In contrast, the orthologous preintegration site on BAC RP11-678G14
had only one nucleotide difference from the sequences flanking the provirus over the
same 500 bp stretch. HERV-K115 was inserted at a junction between ancient
SINE/MIR and LINE/CR1 (L3) elements. No other sequences in GenBank showed
significant similarity to those flanking HERV-K115, except for the three BACs
containing the preintegration site. The four sequences differed from each other at
three positions over the 500 bp immediately flanking the provirus, and these probably
represent single nucleotide polymorphisms among humans. In summary, each
proviral locus has a sufficiently unique sequence to be distinguished easily from any
other locus in the human genome. The sequences of the preintegration sites in
GenBank matched those flanking the corresponding proviruses and differed
substantially from those of any other loci in the human genome.
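To put the flanking-sequence comparison in numbers, the following sketch converts a count of nucleotide differences over the 500 bp window into percent identity. The function name is an assumption made for illustration; the window length, the single difference at the orthologous site, and the 90% figure come from the text.

```python
def percent_identity(differences: int, aligned_length: int) -> float:
    """Percent of aligned positions that are identical."""
    return 100.0 * (aligned_length - differences) / aligned_length


WINDOW_BP = 500  # flanking window compared in the text

# Orthologous preintegration site: one difference over 500 bp.
print(f"{percent_identity(1, WINDOW_BP):.1f}%")

# A paralogous repeat below 90% identity must differ at more than 50 positions.
print(f"{percent_identity(51, WINDOW_BP):.1f}%")
```

The gap between one difference and more than fifty is what makes the orthologous locus easy to tell apart from the other low-copy repeats.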

For each provirus, the proviral junction and preintegration site PCR fragments
from several individuals (Figure 1c) were sequenced and found to match precisely the
sequences for the proviral and preintegration site alleles. Thus, the PCR products
were amplified strictly from the orthologous loci. In particular, the preintegration site
PCR products (A + D, Figure 1c) for HERV-K113 were not derived from any other
low-copy repeat on chromosome 19.
Figure 2

Genetic structures of HERV-K113 and HERV-K115 proviruses. Positions of the viral
open reading frames (ORFs) including ATG start codons and stop codons are shown.
c indicates the first coding exon of the cORF/KRev protein. SD and SA indicate splice
donors and splice acceptors. The triangle below the HERV-K115 genome indicates
the position of the 1 bp deletion relative to other full-length HERV-K proviruses. The
boxes below the map of each provirus depict the three possible translational reading
frames with -1 shifts from any row to the one below it. In HERV-K115, the 1 bp
deletion near the gag-pro boundary resulted in replacement of the normal 31 amino
acids at the carboxyl terminus of the Gag precursor protein with 12 amino acids from
what is the +1 ORF in other HERV-Ks. The novel sequences are shown as the very
small black box at the C terminus of the gag ORF. Even though the pro and pol ORFs
of HERV-K115 are present, they are unlikely to be translated due to the 1 bp deletion
near the gag-pro boundary and are thus excluded from the figure.


Preintegration  site  alleles  were  not  generated 
by  deletion  of  proviruses  by  recombination 

The sequences of the preintegration site PCR products (A + D, Figure 1c) also
argued strongly against the possibility that the preintegration site alleles were
generated by replacement of an existing provirus by a recombination event such as
gene conversion involving a nonorthologous locus containing sequences similar to
those flanking the provirus. If such an event had occurred, sequences for at least a
short distance on either side of the provirus would also have been replaced, and the
preintegration site sequence would be expected to differ from the sequences flanking
the corresponding provirus. However, this was not the case. The sequences of the
amplified products (A + D, Figure 1c) always matched those of the sequences
flanking the proviruses. Thus, it is highly unlikely that the preintegration site allele for
either provirus actually resulted from proviral loss due to a recombination event. In
summary, the PCR analyses (Figure 1c) robustly determined whether the authentic
preintegration site and proviral alleles were present in human DNA samples.
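The logic of the PCR genotyping strategy (Figure 1b) can be summarized as a small decision rule. This is a minimal sketch, not the authors' analysis procedure: the function `call_alleles`, the 400 bp preintegration product size, and the 50 bp size tolerance are hypothetical, while the ~970 bp LTR size and the use of flanking primers A + D plus provirus junction reactions come from the text.

```python
LTR_LENGTH_BP = 970  # approximate HERV-K LTR size given in the text


def call_alleles(ad_product_sizes, junction_detected, preintegration_size,
                 tolerance=50):
    """Infer which alleles are present at one locus from PCR results.

    ad_product_sizes    -- band sizes (bp) obtained with flanking primers A + D
    junction_detected   -- True if a provirus/host junction product was obtained
    preintegration_size -- expected A + D size for the empty site (hypothetical)
    tolerance           -- allowed sizing error in bp (hypothetical)
    """
    alleles = set()
    for size in ad_product_sizes:
        if abs(size - preintegration_size) <= tolerance:
            alleles.add("preintegration site")
        elif abs(size - (preintegration_size + LTR_LENGTH_BP)) <= tolerance:
            # A solo LTR adds one LTR length to the empty-site product.
            alleles.add("solo LTR")
    if junction_detected:
        alleles.add("provirus")
    return alleles


# Heterozygote: empty-site band plus a junction product.
print(sorted(call_alleles([400], True, preintegration_size=400)))
# Homozygous for the provirus: junction product only, no A + D band.
print(sorted(call_alleles([], True, preintegration_size=400)))
# Solo LTR allele: A + D band about 970 bp larger than the empty site.
print(sorted(call_alleles([1370], False, preintegration_size=400)))
```

Sequencing the A + D and junction products, as described above, is what confirms that each band really derives from the orthologous locus rather than from a similar repeat.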

Proteins  encoded  by  the  proviruses 

To determine the coding capacity of each provirus, the viral genomes were
sequenced (Figure 2). HERV-K113 was found to have full-length open reading frames
(ORFs) for all viral proteins, with no substitutions that would alter amino acid
sequence motifs that are well conserved among retroviruses. This distinguishes
HERV-K113 from HERV-K(HLM-2.HOM) [also called HERV-K108 and HERV-K(C7)],
which encodes CIDD instead of a standard YIDD motif in reverse transcriptase, and
from other previously described HERV-Ks [1, 3, 4, 10]. Thus, HERV-K113 is the first
endogenous retrovirus described in humans with full-length ORFs for all viral proteins
and no amino acid substitutions in conserved sequence motifs.

Relative to other HERV-K proviruses [1], HERV-K115 has a 1 bp deletion located
92 bp upstream from the stop codon of the gag ORF (Figure 2). This mutation alters
the carboxyl terminus of the encoded Gag precursor protein and alters the ribosomal
frameshift required to translate the pro and pol ORFs [11] from the standard -1 of the
Betaretroviruses, the genus to which HERV-K belongs along with the related mouse
mammary tumor virus and type-D primate viruses, to +1. Thus, it is unlikely that the
pro and pol ORFs can be translated from HERV-K115. HERV-K115 does encode a
full-length cORF/KRev protein that functions in the nuclear export of viral RNA
[12, 13]. It also encodes a full-length Env protein. The evolutionary conservation of
full-length env ORFs in at least four HERV-K proviruses, HERV-Ks 115, 113, 109, and
108/(HLM-2.HOM)/(C7) (Figure 2 and [1]), strongly suggests that the HERV-K Env
protein is crucial for viral reinfection of germ cells. This implies that, at least much of
the time, such reinfections involve standard retroviral particles containing functional
HERV-K Env protein that enter the cells via a cellular receptor for the virus.
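The change in the required frameshift from -1 to +1 is a matter of reading-frame arithmetic modulo 3, sketched below. The function name and the convention of numbering frames relative to the intact gag frame are assumptions made for illustration; the -1 relationship between gag and pro and the shift of gag translation into the +1 frame after the 1 bp deletion are taken from the text and the Figure 2 legend.

```python
def required_frameshift(ribosome_frame: int, target_frame: int) -> int:
    """Smallest ribosomal shift (in nucleotides) that moves translation from
    ribosome_frame into target_frame, with frames counted modulo 3."""
    delta = (target_frame - ribosome_frame) % 3
    return {0: 0, 1: 1, 2: -1}[delta]


GAG_FRAME = 0                    # reference frame: intact gag
PRO_FRAME = (GAG_FRAME - 1) % 3  # pro lies in the -1 frame relative to gag

# Intact provirus: the ribosome must slip back by one nucleotide.
print(required_frameshift(GAG_FRAME, PRO_FRAME))

# HERV-K115: after the 1 bp deletion, gag translation continues in what is
# the +1 frame of other HERV-Ks, so a forward shift would now be needed.
frame_after_deletion = (GAG_FRAME + 1) % 3
print(required_frameshift(frame_after_deletion, PRO_FRAME))
```

Because the viral frameshift signal promotes -1 slippage, a requirement for a +1 shift is consistent with the conclusion that pro and pol are unlikely to be translated from HERV-K115.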

Estimation  of  proviral  ages  by  LTR  sequence 
comparisons 

The relative ages of endogenous retroviruses can be estimated by comparing the
sequences of the two viral long-terminal repeats (LTRs) [14-18]. Due to the
mechanism of reverse transcription, retroviral LTRs are usually identical at the time
when a provirus forms. Mutations that subsequently accumulate over evolutionary
time are unique to one of the two LTRs. Since both HERV-K113 and HERV-K115 are
full-length proviruses, LTR sequences can be used to estimate how long ago they
formed. Besides these two proviruses, there is a HERV-K solo LTR that is in the
HLA-DQB1 locus of some but not all humans [2, 19]. However, since it is not a
full-length provirus with two LTRs, this type of analysis cannot be applied to estimate
its age.

The two LTRs of HERV-K115 had 14 differences: 13 single bp substitutions and
an 8 bp deletion at 115 bp from the beginning of the 3' LTR (TCTGTTAATCTATGACCT,
deleted bases underlined). Nine of these mutations, including the 8 bp deletion, were
previously observed in both LTRs of multiple other HERV-K proviruses in humans [1].
Thus, it is highly unlikely that these represent de novo mutations in HERV-K115. The
most likely explanation for them is that they arose by a recombination between two
HERV-K genomes, perhaps a gene conversion event involving a second HERV-K
locus in the human genome and one of the HERV-K115 LTRs. Gene conversion of
endogenous retroviral loci was previously documented to have occurred [18].
Therefore, at most five of the differences between the HERV-K115 LTRs likely arose
by mutation over evolutionary time, and even some of those might have been
introduced by the same gene conversion event. Estimation of the relative age of
HERV-K115 based on LTR sequence differences is thus potentially inaccurate,
although a total of five differences probably provides an upper limit for estimating how
long ago it formed.

There were no differences between the two LTRs of HERV-K113. This is consistent
with HERV-K113 being a very new provirus. By comparing the number of differences
between LTRs of individual HERVs and the dates of divergence of the most distantly
related species containing them, it was estimated that endogenous retroviruses
accumulate mutations at a rate of roughly 2.3 × 10^-9 to 5 × 10^-9 substitutions per
site per year [18]. That equals one difference per LTR every 200,000-450,000 years
(HERV-K LTRs are about 970 bp long). Similarly, it was estimated based on intronic
mutation rates that HERV-K(HLM-2.HOM), which has accumulated six differences
between its LTRs, formed about 1.2 million years ago [3]. Using these values, it can
be estimated that HERV-K113, which has no differences between its LTRs, likely
formed more recently than 200,000-450,000 years ago and perhaps considerably
more recently.
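The figure of one difference per LTR every 200,000-450,000 years can be reproduced from the quoted substitution rates and LTR length. The sketch below is only a check of that arithmetic; the names are illustrative assumptions, and the rates and the 970 bp length are the values stated in the text.

```python
LTR_LENGTH_BP = 970          # approximate HERV-K LTR length from the text
RATE_LOW = 2.3e-9            # substitutions per site per year (lower estimate)
RATE_HIGH = 5e-9             # substitutions per site per year (upper estimate)


def years_per_difference(rate_per_site_per_year: float,
                         ltr_length_bp: int = LTR_LENGTH_BP) -> float:
    """Expected time for one substitution to accumulate in a single LTR."""
    return 1.0 / (rate_per_site_per_year * ltr_length_bp)


fastest = years_per_difference(RATE_HIGH)  # roughly 206,000 years
slowest = years_per_difference(RATE_LOW)   # roughly 448,000 years
print(f"One difference per LTR every {fastest:,.0f} to {slowest:,.0f} years")
```

These values round to the 200,000-450,000 year range used in the text as the upper bound on the age of a provirus, such as HERV-K113, whose LTRs are still identical.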

Implications  for  HERV-K  infectivity 

HERV-K first infected the germline of the lineage leading to humans roughly 35
million years ago, sometime after the divergence of platyrrhines (New World monkeys)
from catarrhines (cercopithecoids and hominoids), but before the separation of
cercopithecoids (Old World monkeys) from hominoids (apes and humans). More
distantly related sequences have been described for platyrrhines [20]. The existence
of insertional polymorphisms of HERV-K113 and HERV-K115 in humans, the absence
or low number of nucleotide differences between the LTRs of the proviruses, and the
presence of full-length ORFs for all viral proteins in HERV-K113 support the idea that
these two proviruses were very recent additions to the human genome. Thus, HERV-K
reinfected humans in very recent evolutionary times. Neighbor-joining and
maximum-likelihood analyses showed that both HERV-K113 and HERV-K115 are very
closely related to other human-specific HERV-K proviruses. Thus, they descended
from earlier human HERV-Ks and were not the result of recent cross-species
transmission. Each virus was detected in at least one sub-Saharan African and in at
least one non-African. This is consistent with the proviruses having originally formed
in two individuals in Africa prior to the emergence of modern humans from Africa, but
recently enough that neither proviral allele was fixed in the relatively small human
population at that time. Evidence suggests that the emergence of modern humans
from Africa began perhaps 100,000 years ago [21, 22].

Analysis of the human genome sequence led to the idea that
replication-competent endogenous retroviruses in the human genome may be extinct
or very nearly so [23]. However, neither HERV-K113 nor HERV-K115 was previously
included in GenBank. Moreover, since both HERV-K113 and HERV-K115 were initially
identified by screening just a single individual (RP11), it is reasonable to hypothesize
that there may be additional, more recently acquired HERV-K proviruses that are
present at even lower frequencies among humans today. Strong data support the
conclusion that HERV-K has been infectious in the human lineage from about 35
million years ago through the time of formation of HERV-K113, which may have
occurred as recently as within the last 100,000 years. Unless the virus has suddenly
lost its ability to replicate and reinfect the human germline after being active for about
35 million years, HERV-K should still be infectious in humans today. Recent HERV-K
infections might have occurred by complementation involving viral proteins encoded
by different HERV-K proviruses in the human genome, or they may have involved
individual proviruses. HERV-K113 is the best candidate to be a single provirus that is
active in humans today.

Materials  and  methods 

BAC  screening  and  PCR 

BAC library screening, PCR reactions with Taq polymerase for products <1 kb,
and PCR with Expand Long Template PCR System (Boehringer-Mannheim) for
products >1 kb were performed as described [1]. Amplification of genomic sequences
flanking proviruses in BACs by inverse PCR was performed as described [1, 4, 24].

Supplementary  material 

Supplementary material including primer sequences, GenBank accession
numbers, BAC addresses, and methods of DNA sequence analysis is available at
http://images.cellpress.com/supmat/supmatin.htm.